{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adaptive sampling\n",
    "by G. De Fabritiis\n",
    "<img src=https://www.acellera.com/wp-content/uploads/2014/05/Adaptive-sampling-spawning-tree.png width=\"800\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concept\n",
    "* Exploration of the conformational space using MD simulations can waste lots of simulation time sampling the same conformational regions which does not provide any new knowledge.\n",
    "\n",
    "* Instead it would be desireable to explore more under-sampled regions of the conformational space, to overcome energetic barriers and eventually reach the desired conformation (folded protein / bound ligand etc.)\n",
    "\n",
    "* In adaptive sampling, instead of launching thousands of simulations at once from a small set of initial structures as in naive high-throughput MD, simulations are launched in sequential batches called epochs utilizing knowledge of the conformational space obtained from all previous epochs.\n",
    "\n",
    "* The starting points of the simulations in each epoch are chosen based on some criteria; in this case, it selects conformations from the most undersampled conformational regions detected in all previous simulations. This is done by using Markov state models which discretize the conformational space into a set of most metastable states. Then the starting conformations are sampled based on a distribution related to the population of each state.\n",
    "\n",
    "S. Doerr and G. De Fabritiis, [On-the-fly learning and sampling of ligand binding by high-throughput molecular simulations](http://pubs.acs.org/doi/abs/10.1021/ct400919u), J. Chem. Theory Comput. 10 (5), pp 2064–2069(2014).\n",
    "\n",
    "S. Doerr, M. J. Harvey, Frank Noé, and G. De Fabritiis, [HTMD: High-Throughput Molecular Dynamics for Molecular Discovery](http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049) J. Chem. Theory Comput. 2016 12 (4), 1845-1852\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unit of execution\n",
    "\n",
    "Each simulation in adaptive is associated to a single directory which contains all files to run it. To run a project it is therefore necessary to provide one or more initial simulation directories, called generators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to start\n",
    "\n",
    "Adaptive creates multiple directories which it uses to organize the simulations.\n",
    "The user only needs to provide a generators folder containing one subfolder for each starting conformation containing all files needed to execute that simulation. For example:\n",
    "\n",
    "```\n",
    "└── generators/\n",
    "    ├── gen1/\n",
    "    │   ├── structure.pdb\n",
    "    │   ├── input\n",
    "    │   └── ...\n",
    "    ├── gen2/\n",
    "    │   ├── structure.pdb\n",
    "    │   ├── input\n",
    "    │   └── ...\n",
    "```\n",
    "\n",
    "Then the adaptive will generate an `input`, `data` and later a `filtered` folder as well, looking like this:\n",
    "\n",
    "```\n",
    "├── data/          # Contains the completed simulations (automatically created)\n",
    "├── filtered/      # Contains the completed simulations without water atoms (automatically created)\n",
    "├── generators/    # Contains the initial generators provided by the user\n",
    "└── input/         # Contains the files needed to start all simulations of all epochs (automatically created)\n",
    "```\n",
    "\n",
    "Adaptive uses a naming scheme for simulations which follows the pattern: `e4s3_e2s1p1f45`. This name tells us that this simulation was generated in epoch 4 as the 3rd simulation of the batch. The starting conformation was taken from simulation 1 of epoch 2 from the first piece of the simulation\\* and from frame 45 of that simulation piece.\n",
    "\n",
    "\\* some MD software might fragment simulations into pieces. Usually though this number will be 1 and can be ignored."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulation length\n",
    "1. The length of each simulation is really system dependent. \n",
    "1. It could be anything like tens of nanoseconds to hundred of nanoseconds. \n",
    "1. As a rule of thumb use twice the expected lag time for your molecular process (e.g. for binding anything between 30 and 100 ns)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulation details\n",
    "As only the coordinates files are seeded for every new epoch, simulations cannot use a velocity file. Velocities are therefore reinitialized to the Maxwell Boltzmann distribution at the given temperature.\n",
    "\n",
    "E.g. if setting up the simulations with the `Production` class:\n",
    "\n",
    "```python\n",
    "from htmd.protocols.production_v6 import Production\n",
    "md = Production()\n",
    "md.adaptive = True\n",
    "[...]\n",
    "```\n",
    "or directly modifying the ACEMD `input` file of the simulations and removing the binvelocities line."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adaptive script example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from htmd import *\n",
    "app = LocalGPUQueue()\n",
    "app.datadir = './data'\n",
    "md = AdaptiveMD()\n",
    "md.nmin=5\n",
    "md.nmax=10\n",
    "md.nepochs = 30\n",
    "md.app = app\n",
    "md.projection = MetricDistance('name CA', '(resname BEN) and ((name C7) or (name C6))', metric='contacts')\n",
    "md.ticadim = 3\n",
    "md.updateperiod = 14400 # execute every 4 hours\n",
    "md.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Execution in a notebook \n",
    "\n",
    "1. It is possible to run the adaptive scheme syncronosly or asyncrounsly. \n",
    "1. The option `updateperiod` controls this behaviour. \n",
    "1. The default is to run and exit, so updateperiod needs to be specified if adaptive should be run synchronously"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting a simple cron job\n",
    "\n",
    "1. This is useful for having the script execute automatically every x hours.\n",
    "1. Do not set `updateperiod` then, or set it to zero such that the program will execute and exit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "#!/bin/bash -login\n",
    "# cron.sh file\n",
    "# use crontab -e to add this line:\n",
    "# 0 */4 * * * cd /pathtomydir/; ./cron.sh\n",
    "#\n",
    "python conf.py\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Visualizing the starting conformations\n",
    "\n",
    "If we want to look at what structures were chosen as the starting conformations of a given epoch we can use a code snippet like the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "for s in glob('input/e28s*'):  # Visualize all starting conf of epoch 28\n",
    "   mol = Molecule(s+'/structure.pdb')\n",
    "   mol.read(s+'/input.coor')\n",
    "   mol.view()"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "nav_menu": {},
  "toc": {
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 6,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}